PROBIT_MODEL
Overview
The PROBIT_MODEL function fits a binary probit regression model using maximum likelihood estimation (MLE). Probit regression is a type of discrete choice model used to analyze binary outcomes—situations where the dependent variable takes one of two values (0 or 1), such as purchase decisions, event occurrences, or pass/fail outcomes.
Unlike logistic regression, which uses the logistic cumulative distribution function, probit regression models the probability of the binary outcome using the cumulative distribution function (CDF) of the standard normal distribution. The probability that the outcome equals 1 is given by:
P(Y = 1 | X) = \Phi(X\beta)
where \Phi(\cdot) is the standard normal CDF, X is the matrix of independent variables, and \beta is the vector of coefficients to be estimated.
This implementation uses the statsmodels library, specifically the Probit class from the statsmodels.discrete.discrete_model module. For source code and additional details, see the statsmodels GitHub repository.
The function returns parameter estimates (coefficients), standard errors, z-statistics, p-values, and confidence intervals for each predictor. It also provides model fit statistics including McFadden’s pseudo R-squared, log-likelihood, AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), and the likelihood ratio test p-value. McFadden’s pseudo R-squared is calculated as:
\text{Pseudo } R^2 = 1 - \frac{\log L_{\text{full}}}{\log L_{\text{null}}}
where \log L_{\text{full}} is the log-likelihood of the fitted model and \log L_{\text{null}} is the log-likelihood of the null model (intercept only).
Probit models are widely used in economics, social sciences, and biostatistics for analyzing binary choice behavior. For comprehensive references on discrete choice models, see Cameron & Trivedi’s Regression Analysis of Count Data (1998) and Greene’s Econometric Analysis (2003).
This example function is provided as-is without any representation of accuracy.
Excel Usage
=PROBIT_MODEL(y, x, fit_intercept, alpha)
y(list[list], required): Column vector of binary dependent variable values (0 or 1)x(list[list], required): Matrix of independent variables (predictors), one column per predictorfit_intercept(bool, optional, default: true): If true, adds an intercept term to the modelalpha(float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)
Returns (list[list]): 2D list with probit results and statistics, or error string.
Examples
Example 1: Basic single predictor model
Inputs:
| y | x |
|---|---|
| 0 | 1 |
| 0 | 1.2 |
| 0 | 1.5 |
| 0 | 1.8 |
| 0 | 2 |
| 1 | 2.2 |
| 0 | 2.5 |
| 1 | 2.8 |
| 0 | 3 |
| 1 | 3.2 |
| 1 | 3.5 |
| 1 | 3.8 |
| 1 | 4 |
| 1 | 4.2 |
| 1 | 4.5 |
Excel formula:
=PROBIT_MODEL({0;0;0;0;0;1;0;1;0;1;1;1;1;1;1}, {1;1.2;1.5;1.8;2;2.2;2.5;2.8;3;3.2;3.5;3.8;4;4.2;4.5})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | -4.466 | 2.137 | -2.09 | 0.0366 | -8.654 | -0.2784 |
| x1 | 1.704 | 0.7943 | 2.145 | 0.03196 | 0.1469 | 3.261 |
| pseudo_r_squared | 0.5779 | |||||
| log_likelihood | -4.374 | |||||
| aic | 12.75 | |||||
| bic | 14.16 | |||||
| llr_pvalue | 0.000538 |
Example 2: Model without intercept
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 0 | 1 | false |
| 0 | 1.2 | |
| 0 | 1.5 | |
| 0 | 1.8 | |
| 0 | 2 | |
| 1 | 2.2 | |
| 0 | 2.5 | |
| 1 | 2.8 | |
| 0 | 3 | |
| 1 | 3.2 | |
| 1 | 3.5 | |
| 1 | 3.8 | |
| 1 | 4 | |
| 1 | 4.2 | |
| 1 | 4.5 |
Excel formula:
=PROBIT_MODEL({0;0;0;0;0;1;0;1;0;1;1;1;1;1;1}, {1;1.2;1.5;1.8;2;2.2;2.5;2.8;3;3.2;3.5;3.8;4;4.2;4.5}, FALSE)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x1 | 0.158 | 0.1199 | 1.318 | 0.1875 | -0.07696 | 0.393 |
| pseudo_r_squared | 0.08651 | |||||
| log_likelihood | -9.467 | |||||
| aic | 20.93 | |||||
| bic | 21.64 | |||||
| llr_pvalue |
Example 3: Custom alpha for confidence intervals
Inputs:
| y | x | alpha |
|---|---|---|
| 0 | 1 | 0.1 |
| 0 | 1.5 | |
| 0 | 2 | |
| 1 | 2 | |
| 0 | 2.5 | |
| 1 | 2.5 | |
| 1 | 3 | |
| 0 | 3 | |
| 1 | 3.5 | |
| 1 | 4 | |
| 1 | 4.5 | |
| 1 | 5 |
Excel formula:
=PROBIT_MODEL({0;0;0;1;0;1;1;0;1;1;1;1}, {1;1.5;2;2;2.5;2.5;3;3;3.5;4;4.5;5}, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | -2.976 | 1.746 | -1.704 | 0.0883 | -5.849 | -0.104 |
| x1 | 1.193 | 0.6663 | 1.79 | 0.07338 | 0.09701 | 2.289 |
| pseudo_r_squared | 0.3937 | |||||
| log_likelihood | -4.942 | |||||
| aic | 13.88 | |||||
| bic | 14.85 | |||||
| llr_pvalue | 0.0113 |
Example 4: Multiple predictors with all arguments
Inputs:
| y | x | fit_intercept | alpha | |
|---|---|---|---|---|
| 0 | 1 | 1.5 | true | 0.05 |
| 0 | 1.5 | 2 | ||
| 0 | 2 | 1 | ||
| 0 | 2.5 | 2.5 | ||
| 0 | 2 | 3 | ||
| 1 | 2.2 | 2.8 | ||
| 0 | 3 | 2 | ||
| 1 | 2.8 | 3.2 | ||
| 0 | 3.2 | 2.5 | ||
| 1 | 3 | 3.5 | ||
| 0 | 3.5 | 3 | ||
| 1 | 3.5 | 3.8 | ||
| 1 | 4 | 3.5 | ||
| 1 | 4 | 4 | ||
| 1 | 4.5 | 4.2 | ||
| 1 | 4.2 | 4.5 | ||
| 1 | 5 | 4.8 | ||
| 1 | 5.5 | 5 |
Excel formula:
=PROBIT_MODEL({0;0;0;0;0;1;0;1;0;1;0;1;1;1;1;1;1;1}, {1,1.5;1.5,2;2,1;2.5,2.5;2,3;2.2,2.8;3,2;2.8,3.2;3.2,2.5;3,3.5;3.5,3;3.5,3.8;4,3.5;4,4;4.5,4.2;4.2,4.5;5,4.8;5.5,5}, TRUE, 0.05)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | -11.02 | 6.889 | -1.6 | 0.1095 | -24.53 | 2.477 |
| x1 | -0.5517 | 1.011 | -0.5459 | 0.5851 | -2.533 | 1.429 |
| x2 | 4.185 | 2.624 | 1.595 | 0.1107 | -0.9579 | 9.327 |
| pseudo_r_squared | 0.75 | |||||
| log_likelihood | -3.091 | |||||
| aic | 12.18 | |||||
| bic | 14.85 | |||||
| llr_pvalue | 0.00009379 |
Python Code
import numpy as np
from statsmodels.discrete.discrete_model import Probit as statsmodels_probit
def probit_model(y, x, fit_intercept=True, alpha=0.05):
"""
Fits a binary probit regression model using maximum likelihood estimation.
See: https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Probit.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Column vector of binary dependent variable values (0 or 1)
x (list[list]): Matrix of independent variables (predictors), one column per predictor
fit_intercept (bool, optional): If true, adds an intercept term to the model Default is True.
alpha (float, optional): Significance level for confidence intervals (between 0 and 1) Default is 0.05.
Returns:
list[list]: 2D list with probit results and statistics, or error string.
"""
def to2d(arr):
return [[arr]] if not isinstance(arr, list) else arr
def validate_numeric_2d(arr, name):
arr = to2d(arr)
if not isinstance(arr, list):
return f"Invalid input: {name} must be a 2D list."
if not all(isinstance(row, list) for row in arr):
return f"Invalid input: {name} must be a 2D list."
flat = []
for row in arr:
for val in row:
if not isinstance(val, (int, float)):
return f"Invalid input: {name} must contain only numeric values."
if np.isnan(val) or np.isinf(val):
return f"Invalid input: {name} contains non-finite values."
flat.append(float(val))
return arr, flat
def validate_alpha(alpha_val):
if not isinstance(alpha_val, (int, float)):
return "Invalid input: alpha must be a number."
alpha_val = float(alpha_val)
if np.isnan(alpha_val) or np.isinf(alpha_val):
return "Invalid input: alpha must be finite."
if alpha_val <= 0 or alpha_val >= 1:
return "Invalid input: alpha must be between 0 and 1."
return alpha_val
# Validate inputs
y_result = validate_numeric_2d(y, "y")
if isinstance(y_result, str):
return y_result
y_2d, y_flat = y_result
x_result = validate_numeric_2d(x, "x")
if isinstance(x_result, str):
return x_result
x_2d, x_flat = x_result
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
alpha_val = validate_alpha(alpha)
if isinstance(alpha_val, str):
return alpha_val
# Check y is a column vector
if len(y_2d[0]) != 1:
return "Invalid input: y must be a column vector (single column)."
# Check y contains only 0s and 1s
for val in y_flat:
if val not in (0.0, 1.0):
return "Invalid input: y must contain only binary values (0 or 1)."
# Get number of observations and predictors
n_obs = len(y_flat)
n_rows_x = len(x_2d)
if n_rows_x != n_obs:
return "Invalid input: x and y must have the same number of rows."
# Check all rows in x have the same number of columns
n_cols_x = len(x_2d[0]) if x_2d else 0
if not all(len(row) == n_cols_x for row in x_2d):
return "Invalid input: all rows in x must have the same length."
# Convert to matrix form
try:
y_array = np.array(y_flat).reshape(-1, 1)
x_array = np.array(x_2d)
# Add intercept if requested
if fit_intercept:
x_array = np.column_stack([np.ones(n_obs), x_array])
# Fit probit model
model = statsmodels_probit(y_array, x_array)
result = model.fit(disp=0)
# Get confidence intervals
conf_int = result.conf_int(alpha=alpha_val)
# Build output table
output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]
# Parameter names
param_names = []
if fit_intercept:
param_names.append('intercept')
for i in range(n_cols_x):
param_names.append(f'x{i+1}')
# Helper to convert NaN/inf to empty string
def safe_float(val):
fval = float(val)
return '' if (np.isnan(fval) or np.isinf(fval)) else fval
# Add parameter results
for i, param_name in enumerate(param_names):
output.append([
param_name,
safe_float(result.params[i]),
safe_float(result.bse[i]),
safe_float(result.tvalues[i]),
safe_float(result.pvalues[i]),
safe_float(conf_int[i, 0]),
safe_float(conf_int[i, 1])
])
# Add model statistics
output.append(['pseudo_r_squared', safe_float(result.prsquared), '', '', '', '', ''])
output.append(['log_likelihood', safe_float(result.llf), '', '', '', '', ''])
output.append(['aic', safe_float(result.aic), '', '', '', '', ''])
output.append(['bic', safe_float(result.bic), '', '', '', '', ''])
output.append(['llr_pvalue', safe_float(result.llr_pvalue), '', '', '', '', ''])
return output
except Exception as exc:
return f"statsmodels.discrete.discrete_model.Probit error: {exc}"